library(ggplot2)
library(dplyr)
library(ggmap)
library(tidyr)
library(cluster)
library(lubridate)
library(rgl)
library(maptools)
library(ggpubr)
library(viridis)
library(tidyverse)
library(scatterplot3d)
library(factoextra)
library(fpc)
library(NbClust)
library(reshape2)
library(scales)
library(plyr)
library(caret)
library(ROCR)
library(pROC)
library(rlist)
library(Hmisc)
library(corrplot)
library(ggcorrplot)
library(DMwR)
library(ggthemes)
library(e1071)
library(maps)
library(RColorBrewer)
model_dir = "models"
data_dir = "data"
map_dir = "map"
saved_maps = list.files(map_dir)
### Loading data
data <- read.csv(paste(data_dir,"MCI_2014_to_2017.csv",sep="/"), header = TRUE, sep = ",")
### Loading map
for(file in saved_maps) {
load(paste(map_dir,file,sep="/"))
}
The second assignment for the York Machine Learning course, Machine Learning in Business Context, was to explore unsupervised machine learning algorithms, specifically clustering. We chose a dataset from the Toronto Police Service Public Safety Data Portal, MCI 2014 to 2017. This report follows the structure laid out in the CRISP-DM methodology.
The GitHub repository for all the source code is located at the following link: link here.
The RShiny app is located at the following link: link here.
The Toronto Police Service is the municipal police force serving the City of Toronto, and the largest municipal police service in Canada. It is a taxpayer-funded service, ranking second among the City of Toronto's budgetary expenses.
The objective of this model is to cluster crimes to determine which areas of Toronto have the most levels of crime, overall and for different Major Crime Indicators. This would enable the Toronto Police to most effectively allocate their officers and specialists to the areas that require them the most. The hope is that this would be an effective way to lower crime rates and enable more cases to be solved, all without spending more money.
There are some ethical implications of using crime data. There are many avenues for bias to enter the data set. Police services around North America have come under increased scrutiny in recent years for racist policing. Some police policies or laws inherently disadvantage certain groups of people, which would create bias in the data. This means that conclusions drawn from a machine learning model based on biased data would create biased results. The conclusions should be looked at with other data, such as demographic data, and supplementary information, such as social considerations.
The data set was provided courtesy of the Toronto Police Service Open Data Portal. It is usable in accordance with the Open Government License - Ontario.
The data concerns all Major Crime Indicators (MCI) occurrences in Toronto by reported date, between 2014 and 2017. The MCIs are Assault, Break and Enter, Auto Theft, and Theft Over (excludes Sexual Assaults). Locations in the data set are approximate, as they have been deliberately offset to the nearest road intersection to protect the privacy of involved individuals.
There are 29 columns with 131,073 observations. However, 4 of these columns are duplicates, bringing us down to 25 usable columns. The attributes are both numeric and categorical, and can be further divided into date-type and crime-type data.
The following table shows the numeric attributes:
| Attribute | Description |
|---|---|
| … | … |
While this table shows the categorical attributes:
| Attribute | Description |
|---|---|
| … | … |
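As a quick sanity check on this numeric/categorical split, the column classes can be tallied directly. A minimal sketch on a toy frame standing in for the full dataset (the columns here are illustrative; on the real data the call is simply table(sapply(data, class))):

```r
# Toy frame mirroring a few of the dataset's columns; the real call is
# table(sapply(data, class)) on the loaded data frame
toy <- data.frame(Lat = c(43.6, 43.7), Long = c(-79.4, -79.5),
                  MCI = c("Assault", "Robbery"),
                  premisetype = c("House", "Apartment"),
                  stringsAsFactors = TRUE)
type_counts <- table(sapply(toy, class))
type_counts
```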
Visualizing crime levels for all neighbourhoods.
shpfile <- paste(data_dir,"NEIGHBORHOODS_WGS84_2.shp",sep="/")
sh <- readShapePoly(shpfile)
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
sh@data$AREA_S_CD <- as.integer(sh@data$AREA_S_CD)
total_offence_cnt_table = data %>% group_by(Hood_ID) %>% dplyr::summarise(offence_cnt = n())
hood_total_offence_cnt_table = merge(total_offence_cnt_table,sh@data,by.x='Hood_ID',by.y='AREA_S_CD')
points_offense_cnt <- fortify(sh, region = 'AREA_S_CD')
points_offense_cnt <- merge(points_offense_cnt, hood_total_offence_cnt_table, by.x='id', by.y='Hood_ID', all.x=TRUE)
torontoMap + geom_polygon(aes(x=long,y=lat, group=group, fill=offence_cnt), data=points_offense_cnt, color='black') +
scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))
Visualizing the Toronto crime heatmap across neighbourhoods.
base_size <- 9
heat_group <- group_by(toronto, Neighbourhood, offence)
heat_count <- dplyr::summarise(heat_group,Total = n())
heat_count$Neighbourhood <- with(heat_count,reorder(Neighbourhood,Total))
heat_count.m <- melt(heat_count)
## Using Neighbourhood, offence as id variables
heat_count.m <- ddply(heat_count.m, .(variable), transform,rescale = rescale(value))
ggplot(heat_count.m, aes(Neighbourhood, offence)) +
geom_tile(aes(fill = rescale),colour = "white") +
ggtitle("Toronto Criminal Heatmap") +
scale_fill_gradient(low = "lightblue",high = "darkblue") +
theme_grey(base_size = base_size) +
labs(x = "", y = "") +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme_minimal() +
theme(legend.position = "none",axis.ticks = element_blank(), axis.text.x = element_text(size = base_size *0.8, angle = 330, hjust = 0, colour = "grey50"))
We can also visualize some interesting information from the dataset, e.g. finding the most dangerous or safest zones and neighbourhoods in Toronto, based on all MCIs or on a specific MCI. The following charts display the top 5 dangerous and top 5 safe neighbourhoods in Toronto by total MCI count. For more options covering other top neighbourhoods/divisions, by total MCI or for a specific MCI, please explore our application.
location_group <- group_by(toronto, Neighbourhood)
crime_by_location <- dplyr::summarise(location_group, n=n())
crime_dangerous <- crime_by_location[order(crime_by_location$n, decreasing = TRUE), ]
crime_dangerous_top <- head(crime_dangerous, 5)
ggplot(aes(x = reorder(Neighbourhood, n), y = n, fill = reorder(Neighbourhood, n)), data = crime_dangerous_top) +
geom_bar(stat = 'identity', width = 0.6) +
ggtitle("Top 5 dangerous neighbourhoods in Toronto") +
coord_flip() +
xlab('Zone') +
ylab('Number of Occurrences') +
scale_fill_brewer(palette = "Reds") +
theme(plot.title = element_text(size = 16),
axis.title = element_text(size = 12, face = "bold"),
legend.position="none")
crime_safe <- crime_by_location[order(crime_by_location$n, decreasing = FALSE), ]
crime_safe_top <- head(crime_safe, 5)
ggplot(aes(x = reorder(Neighbourhood, -n), y = n, fill = reorder(Neighbourhood, -n)), data = crime_safe_top) +
geom_bar(stat = 'identity', width = 0.6) +
ggtitle("Top 5 safe neighbourhoods in Toronto") +
coord_flip() +
xlab('Zone') +
ylab('Number of Occurrences') +
scale_fill_brewer(palette = "Greens") +
theme(plot.title = element_text(size = 16),
axis.title = element_text(size = 12, face = "bold"),
legend.position="none")
### (3) Data outliers

It seems some crimes were reported more than 40 years after they occurred. Since nothing stops people from reporting old crimes, we do not treat these rows as outliers.
data$occurrencedate <- ymd(gsub("(.*)T.*", "\\1", data$occurrencedate))
data$reporteddate <- ymd(gsub("(.*)T.*", "\\1", data$reporteddate))
data[which(data$occurrencedate < as.Date("1970-01-01")),]
## ï..X Y Index_ event_unique_id occurrencedate
## 1257 -79.53082 43.64304 147 GO-2015636294 1966-06-09
## 11889 -79.44592 43.68811 5156 GO-20143212768 1968-01-01
## reporteddate premisetype ucr_code ucr_ext offence reportedyear
## 1257 2015-04-17 Apartment 1430 100 Assault 2015
## 11889 2014-10-31 House 1430 100 Assault 2014
## reportedmonth reportedday reporteddayofyear reporteddayofweek
## 1257 April 17 107 Friday
## 11889 October 31 304 Friday
## reportedhour occurrenceyear occurrencemonth occurrenceday
## 1257 15 NA NA
## 11889 11 NA NA
## occurrencedayofyear occurrencedayofweek occurrencehour MCI
## 1257 NA 0 Assault
## 11889 NA 9 Assault
## Division Hood_ID Neighbourhood Lat Long
## 1257 D22 14 Islington-City Centre West (14) 43.64304 -79.53082
## 11889 D13 107 Oakwood Village (107) 43.68811 -79.44592
## FID
## 1257 257
## 11889 5889
The rest of the data is mostly categorical in nature, so no outlier concerns need to be addressed there.
The following code documents the preprocessing steps identified for this dataset.
#reload the data
data <- read.csv(paste(data_dir,"MCI_2014_to_2017.csv",sep="/"), header = TRUE, sep = ",")
First, we want to remove any duplicate data, both rows and columns. Some events have duplicated event IDs and should be removed. We also have duplicate columns for X/Y and Lat/Long, which should be removed. We don't use the UCR codes or the ID numbers, so they're removed as well.
#remove duplicate rows/entries
data <- subset(data, !duplicated(data$event_unique_id))
#remove columns that aren't used/duplicates
#note: the first column name carries a UTF-8 BOM and reads in as "ï..X"
data <- data[, !colnames(data) %in% c("ï..X","Y","Index_","event_unique_id","ucr_code","ucr_ext","FID")]
Next, we format the dates. There are garbage time values at the end of the dates, which are removed. The other date values are also changed into appropriate date/time values. Whitespace is also present in the day of week columns, so that is trimmed.
#formatting dates - remove garbage time values at the end
data$occurrencedate <- gsub("(.*)T.*", "\\1", data$occurrencedate)
data$reporteddate <- gsub("(.*)T.*", "\\1", data$reporteddate)
data$occurrencetime = ymd_h(paste(data$occurrencedate,data$occurrencehour,sep = " "), tz="EST")
data$reportedtime = ymd_h(paste(data$reporteddate,data$reportedhour,sep = " "), tz="EST")
data$occurrencedate = ymd(data$occurrencedate)
data$reporteddate = ymd(data$reporteddate)
#removing whitespace from day of week
data$occurrencedayofweek <- as.factor(trimws(data$occurrencedayofweek, "b"))
data$reporteddayofweek <- as.factor(trimws(data$reporteddayofweek, "b"))
We also convert the month/day of week from string representation to integer representation:
data$reportedmonth = month(data$reportedtime)
data$reporteddayofweek = wday(data$reportedtime)
data$occurrencemonth = month(data$occurrencetime)
data$occurrencedayofweek = wday(data$occurrencetime)
Now let’s take a look at the missing data:
#missing data
colSums(is.na(data))
## ï..X occurrencedate reporteddate
## 0 0 0
## premisetype offence reportedyear
## 0 0 0
## reportedmonth reportedday reporteddayofyear
## 0 0 0
## reporteddayofweek reportedhour occurrenceyear
## 0 0 32
## occurrencemonth occurrenceday occurrencedayofyear
## 0 32 32
## occurrencedayofweek occurrencehour MCI
## 0 0 0
## Division Hood_ID Neighbourhood
## 0 0 0
## Lat Long occurrencetime
## 0 0 0
## reportedtime
## 0
NAdata <- unique(unlist(lapply(data, function(x) which(is.na(x)))))
Rows with NA data:
NAdata
## [1] 921 922 923 964 986 1036 1037 1086 1087 1088 1157
## [12] 1158 1159 1211 1228 1508 1509 1510 1689 1690 1691 1692
## [23] 3330 3331 10372 10373 13858 13859 13860 13861 95190 98356
Occurence dates for rows with NA data:
data$occurrencedate[NAdata]
## [1] "1989-01-01" "1998-01-01" "1991-01-01" "1998-01-01" "1999-10-01"
## [6] "1998-01-01" "1996-01-01" "1966-06-09" "1980-04-24" "1970-01-01"
## [11] "1999-01-01" "1996-01-01" "1992-12-21" "1996-01-31" "1998-06-01"
## [16] "1974-01-01" "1995-01-01" "1990-01-01" "1988-01-01" "1988-01-01"
## [21] "1973-08-31" "1999-03-24" "1999-01-01" "1982-03-13" "1968-01-01"
## [26] "1995-01-01" "1994-01-01" "1989-01-01" "1987-01-01" "1989-11-20"
## [31] "1996-01-01" "1978-04-10"
We can see that there are 32 missing values, all in occurrence date related columns. Upon further inspection, these are all the same rows. We can also see that the occurrence date value is correct, so these date type columns can have their missing values imputed.
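The "same rows" claim is worth verifying in code: every column's set of NA indices should be either empty or equal to NAdata. A minimal sketch of that check on a toy frame (the toy columns are hypothetical; the same test applies verbatim to data and NAdata):

```r
# Verify the claim: every column is either NA-free or NA in exactly the same rows
toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c("x", "y", "z", "w"),
                  c = c(10, NA, 30, NA))
na_sets <- lapply(toy, function(x) which(is.na(x)))
na_rows <- sort(unique(unlist(na_sets, use.names = FALSE)))
same_rows <- all(sapply(na_sets,
                        function(ix) length(ix) == 0 || setequal(ix, na_rows)))
same_rows
```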
#imputing occurrence date fields from the occurrence date
data$occurrenceyear[NAdata] <- year(data$occurrencedate[NAdata])
#use the integer representations here to stay consistent with the conversion above
data$occurrencemonth[NAdata] <- month(data$occurrencedate[NAdata])
data$occurrenceday[NAdata] <- day(data$occurrencedate[NAdata])
data$occurrencedayofweek[NAdata] <- wday(data$occurrencedate[NAdata])
data$occurrencedayofyear[NAdata] <- yday(data$occurrencedate[NAdata])
Now we replace the space in the strings with an underscore for easier processing:
#replace space in string
data$offence <- gsub("\\s", "_", data$offence)
data$MCI <- gsub("\\s", "_", data$MCI)
Next, all columns except Lat, Long, reportedtime, and occurrencetime are converted into factors. Unused factor levels (left over from the earlier cleaning steps) are then dropped.
#change things to factors
for(col in c("offence","MCI","Division","Hood_ID")) {
data[,col] = as.factor(data[,col])
}
#with the Gower distance and the daisy() function, the following columns could be converted to ordered factors;
#but since the Gower distance is too expensive for a dataset this size, we treat them as plain factors as well
for(col in c("reportedyear","reportedmonth","reportedday","reporteddayofyear","reporteddayofweek",
"reportedhour","occurrenceyear","occurrencemonth","occurrenceday","occurrencedayofyear",
"occurrencedayofweek","occurrencehour")) {
data[,col] = factor(data[,col])
}
#drop unused factor levels
for(col in names(data)) {
if(is.factor(data[,col])) {
data[,col] = droplevels(data[,col])
}
}
Finally, we check for missing data one last time:
# Check missing values one last time
colSums(is.na(data))
## ï..X occurrencedate reporteddate
## 0 0 0
## premisetype offence reportedyear
## 0 0 0
## reportedmonth reportedday reporteddayofyear
## 0 0 0
## reporteddayofweek reportedhour occurrenceyear
## 0 0 0
## occurrencemonth occurrenceday occurrencedayofyear
## 0 0 0
## occurrencedayofweek occurrencehour MCI
## 0 0 0
## Division Hood_ID Neighbourhood
## 0 0 0
## Lat Long occurrencetime
## 0 0 0
## reportedtime
## 0
With the data prepared, we can now start looking at models.
First, we can look at clustering crime by neighborhood. We need to coerce the data into a clusterable table, grouped by MCI and neighborhood.
# Neighbourhood first
#first, coerce the data into a table that can be clustered - we aren't interested in the occurrence date at this point
#courtesy of Susan Li - https://datascienceplus.com/exploring-clustering-and-mapping-torontos-crimes/
bygroup <- group_by(data, MCI, Hood_ID)
groups <- dplyr::summarise(bygroup, n=n())
groups <- groups[c("Hood_ID", "MCI", "n")]
hood <- as.data.frame(spread(groups, key=MCI, value=n))
hood_id = as.integer(hood[,"Hood_ID"])
hood = hood[,-1]
head(hood)
## Assault Auto_Theft Break_and_Enter Robbery Theft_Over
## 1 925 1091 501 243 187
## 2 867 203 132 302 14
## 3 203 59 84 73 10
## 4 233 74 61 57 4
## 5 206 53 56 66 4
## 6 389 175 144 73 19
Then we can use this table to perform k-means clustering. First, we normalize the data and determine the number of clusters by plotting the within-cluster sum of squares for a range of k and picking the value at the elbow of the plot.
hood <- scale(hood)
#determine number of clusters
wssplot <- function(data, nc=15, seed=1234) {
wss <- (nrow(data)-1)*sum(apply(data,2,var))
for (i in 2:nc) {
set.seed(seed)
wss[i] <- sum(kmeans(data, centers=i)$withinss)
}
plot(1:nc, wss, type="b", xlab="Number of Clusters",
ylab="Within groups sum of squares")
}
#we can see there's an elbow around 3 clusters
wssplot(hood, nc=15)
It seems that 3 is a good choice for the number of clusters, so we proceed to perform k-means clustering on the dataset. We followed this tutorial (https://www.datanovia.com/en/lessons/cluster-validation-statistics-must-know-methods/) to evaluate the quality of the clustering results; specifically, we use the silhouette width as a quantitative measure of clustering quality. As the silhouette plot will show, both cluster #2 and cluster #3 contain points whose silhouette width falls below zero, indicating a less than ideal clustering. This can also be observed in the first plot, where clusters #2 and #3 sit quite close to each other.
# K-means clustering
km.res <- eclust(hood, "kmeans", k = 3, graph = T)
fviz_silhouette(km.res, palette = "jco", ggtheme = theme_classic())
## cluster size ave.sil.width
## 1 1 10 0.01
## 2 2 37 0.21
## 3 3 93 0.59
Let’s look at the cluster means from the outputs to see what we can learn.
# k-means results
km.res
## K-means clustering with 3 clusters of sizes 10, 37, 93
##
## Cluster means:
## Assault Auto_Theft Break_and_Enter Robbery Theft_Over
## 1 2.5319589 1.6064456 2.5015651 2.3608452 2.9465633
## 2 0.5229260 0.3375853 0.6012997 0.5174595 0.2914680
## 3 -0.4802995 -0.3070442 -0.5082122 -0.4597253 -0.4327952
##
## Clustering vector:
## [1] 1 2 3 3 3 3 3 3 3 3 3 3 3 1 3 3 2 3 3 3 2 3 3 2 2 2 1 3 3 3 2 3 3 3 3
## [36] 3 3 3 2 2 3 3 3 3 2 3 2 3 3 2 2 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2
## [71] 3 3 1 2 1 1 1 1 2 3 3 2 3 3 3 2 3 3 3 3 3 3 2 3 1 3 3 2 3 3 3 3 3 3 3
## [106] 3 3 3 3 3 2 3 2 3 3 3 2 3 2 2 2 2 3 2 3 2 2 2 3 2 2 2 3 3 3 2 1 2 3 3
##
## Within cluster sum of squares by cluster:
## [1] 153.40984 68.48510 45.97407
## (between_SS / total_SS = 61.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault" "clust_plot" "silinfo" "nbclust"
## [13] "data"
We can see that cluster 3 has lower than average crime. It has 93 neighborhoods, meaning that the majority of Toronto is quite safe in terms of the number of incidents occurring. Cluster 2 has slightly above average crime numbers, and only 37 neighborhoods. The remaining 10 neighborhoods, in cluster 1, have much higher than average crime numbers. In other words, crime is concentrated in a few small pockets of Toronto.
As an interesting exercise, we also look at a 3D representation of the clustering results. With 3 principal components, more of the variance is captured than with 2. With a third dimension, we can see that there is more spread than is apparent in two dimensions.
#3D plot
pc <-princomp(hood, cor=TRUE, scores=TRUE)
plot3d(pc$scores[,1:3], col=km.res$cluster, main="k-means clusters")
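To quantify how much the third component adds, the proportion of variance explained can be read off the princomp object. A sketch on synthetic data standing in for the scaled hood table (the dimensions and correlation structure here are assumptions, not the real data):

```r
set.seed(42)
# Synthetic stand-in for the scaled hood table: 140 rows, 5 correlated columns
base <- rnorm(140)
m <- sapply(1:5, function(i) base + rnorm(140, sd = 0.5))
pc <- princomp(m, cor = TRUE)
# cumulative proportion of variance explained by the components
cumvar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
round(cumvar, 2)
```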
Next, we can look at the neighborhoods with a hierarchical clustering approach. Again, we need to determine how many clusters we want. For hierarchical clustering, we look at the cluster sizes that result from different configurations (here, a configuration is the number of clusters).
counts <- sapply(2:6, function(ncl) eclust(hood, "hclust", k = ncl, hc_metric = "euclidean", hc_method = "ward.D2")$size)
names(counts) <- 2:6
counts
## $`2`
## [1] 10 130
##
## $`3`
## [1] 10 44 86
##
## $`4`
## [1] 1 44 86 9
##
## $`5`
## [1] 1 44 86 7 2
##
## $`6`
## [1] 1 15 86 7 29 2
Still, we can see that 3 clusters is a reasonable choice if cluster size is the only criterion for choosing the number of clusters: we don't want configurations that split off clusters containing only a handful of neighborhoods.
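The "no tiny clusters" criterion can be made explicit: among the candidate values of k, keep the largest one whose smallest cluster still clears a minimum size. A hedged sketch on synthetic blobs (the threshold of 5 and the data are illustrative, not from the report):

```r
set.seed(1234)
# Three well-separated synthetic blobs of 20 points each
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
hc <- hclust(dist(x), method = "ward.D2")
# smallest cluster size for each candidate k
min_sizes <- sapply(2:6, function(k) min(table(cutree(hc, k))))
names(min_sizes) <- 2:6
# keep the largest k whose smallest cluster still has at least 5 members
best_k <- max(as.integer(names(min_sizes)[min_sizes >= 5]))
best_k
```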
Now that we have chosen the number of clusters, we proceed to the clustering process. As previously in k-means, we look at Silhouette width as a quantitative measure to evaluate the clustering quality. To visually inspect the clusters, we plot dendrograms.
# Hierarchical clustering
hc.res <- eclust(hood, "hclust", k = 3, hc_metric = "euclidean",
hc_method = "ward.D2")
# Visualize dendrograms
fviz_dend(hc.res, show_labels = FALSE,
palette = "jco", as.ggplot = TRUE)
fviz_silhouette(hc.res, palette = "jco", ggtheme = theme_classic())
## cluster size ave.sil.width
## 1 1 10 0.04
## 2 2 44 0.16
## 3 3 86 0.61
It seems we have reached the same conclusion as with k-means: clusters #1 and #2 are not well separated. The following code plots the clustering results on the map. We also plot the offence count in each neighbourhood; the second plot is highly correlated with the first (which makes sense). Map with k-Means Clustering vs. Manual Cluster Map
cluster_ids <- km.res$cluster
hood_ids_and_cluster_ids <- data.frame(cbind(hood_id,cluster_ids))
hood_ids_and_cluster_ids$cluster_ids = as.factor(hood_ids_and_cluster_ids$cluster_ids)
shpfile <- paste(data_dir,"NEIGHBORHOODS_WGS84_2.shp",sep="/")
sh <- readShapePoly(shpfile)
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
sh@data$AREA_S_CD <- as.integer(sh@data$AREA_S_CD)
hood_name_and_cluster_ids = merge(hood_ids_and_cluster_ids,sh@data,by.x='hood_id',by.y='AREA_S_CD')
points_clustering <- fortify(sh, region = 'AREA_S_CD')
points_clustering <- merge(points_clustering, hood_name_and_cluster_ids, by.x='id', by.y='hood_id', all.x=TRUE)
p1 = torontoMap + geom_polygon(aes(x=long,y=lat, group=group, fill=cluster_ids), data=points_clustering, color='black') +
scale_fill_brewer(palette = "Set2")
p2 = torontoMap + geom_polygon(aes(x=long,y=lat, group=group, fill=offence_cnt), data=points_offense_cnt, color='black') +
scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))
p3 = ggarrange(p1, p2, ncol = 2, nrow = 1, common.legend = F)
print(p3)
Map with Hierarchical Clustering vs. Manual Cluster Map
cluster_ids <- hc.res$cluster
hood_ids_and_cluster_ids <- data.frame(cbind(hood_id,cluster_ids))
hood_ids_and_cluster_ids$cluster_ids = as.factor(hood_ids_and_cluster_ids$cluster_ids)
shpfile <- paste(data_dir,"NEIGHBORHOODS_WGS84_2.shp",sep="/")
sh <- readShapePoly(shpfile)
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
sh@data$AREA_S_CD <- as.integer(sh@data$AREA_S_CD)
hood_name_and_cluster_ids = merge(hood_ids_and_cluster_ids,sh@data,by.x='hood_id',by.y='AREA_S_CD')
points_clustering <- fortify(sh, region = 'AREA_S_CD')
points_clustering <- merge(points_clustering, hood_name_and_cluster_ids, by.x='id', by.y='hood_id', all.x=TRUE)
p1 = torontoMap + geom_polygon(aes(x=long,y=lat, group=group, fill=cluster_ids), data=points_clustering, color='black') +
scale_fill_brewer(palette = "Set2")
p2 = torontoMap + geom_polygon(aes(x=long,y=lat, group=group, fill=offence_cnt), data=points_offense_cnt, color='black') +
scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))
p3 = ggarrange(p1, p2, ncol = 2, nrow = 1, common.legend = F)
print(p3)
### (2) Clustering strategy II

We can perform clustering on lat/long to look at natural crime hotspots. We can compare this to a manual cluster like the Toronto police divisions.
latlong <- data[, colnames(data) %in% c("Lat", "Long")]
# k-means
# 34 divisions in Toronto
k.means.fit <- kmeans(latlong, 34)
torclus <- as.data.frame(k.means.fit$centers)
torclus$size <- k.means.fit$size
latlongclus <- latlong
latlongclus$cluster <- as.factor(k.means.fit$cluster)
tormap <- get_map(location =c(left=-79.8129, bottom=43.4544, right=-78.9011, top=43.9132))
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=43.6838,-79.357&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN
#plot each incident, color-coded by cluster
ggmap(tormap) +
geom_point(data = latlongclus, aes(x = Long, y = Lat, color = as.factor(cluster))) +
theme_void() + coord_map()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
#plot bubble map with cluster centroids, size/ color determined by the number of incidents in each cluster
ggmap(tormap) +
geom_point(data = torclus, aes(x = Long, y = Lat, size = size)) +
theme_void() + coord_map()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
We can see how the clusters look with both the incidents plotted as well as the centroids. Next, we can perform k-means clustering on the natural crime hotspot clusters and on the Toronto Police divisions to see how they compare to one another.
#coerce data for Hotspots
data2 <- data
data2$cluster <- k.means.fit$cluster
bygroup <- group_by(data2, MCI, cluster)
groups <- dplyr::summarise(bygroup, n=n())
groups <- groups[c("cluster", "MCI", "n")]
hotspot <- data.frame(spread(groups, key=MCI, value=n))
hotspot <- hotspot[, -1]
hotspot = data.frame(scale(hotspot))
#determine number of clusters
#we can see there's an elbow around 4 clusters
wssplot(hotspot, nc=15)
We can see here that there is an elbow around 4, so we run k-means with 4 clusters. Looking at the cluster means below, cluster 2 has much higher than average auto thefts and robberies but below-average break and enters; cluster 3 is the most crime-centric part of Toronto, well above average on every MCI except auto theft; cluster 4 is a safe part of Toronto, below average on all MCIs; and cluster 1 sits near the average. Luckily, only 4 hotspots fall into each of clusters 2 and 3, while clusters 1 and 4, with 16 and 10 hotspots respectively, have generally fewer incidents overall.
From the silhouette plot, we can see that clustering on this data set works better than in clustering strategy I.
km.res <- eclust(hotspot, "kmeans", k = 4, graph = T)
fviz_silhouette(km.res, palette = "jco", ggtheme = theme_classic())
## cluster size ave.sil.width
## 1 1 16 0.22
## 2 2 4 0.22
## 3 3 4 0.24
## 4 4 10 0.46
km.res
## K-means clustering with 4 clusters of sizes 16, 4, 4, 10
##
## Cluster means:
## Assault Auto_Theft Break_and_Enter Robbery Theft_Over
## 1 -0.03088279 -0.2256176 0.3005858 0.02643693 -0.3488020
## 2 0.55292426 2.4255005 -0.5324398 1.12469735 0.5700703
## 3 1.72830722 -0.6195701 1.6301725 1.22286588 2.0263084
## 4 -0.86308013 -0.3613840 -0.9200304 -0.98132438 -0.4804683
##
## Clustering vector:
## [1] 4 1 1 4 1 1 4 3 3 4 4 1 1 1 4 2 2 4 1 1 1 1 1 2 2 3 1 4 3 1 4 1 4 1
##
## Within cluster sum of squares by cluster:
## [1] 22.932432 10.847959 12.848323 6.212482
## (between_SS / total_SS = 68.0 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault" "clust_plot" "silinfo" "nbclust"
## [13] "data"
#plotting k-means
clusplot(hotspot, km.res$cluster, main='2D representation of the Cluster solution',
color=TRUE, shade=TRUE,
labels=2, lines=0)
#pc <-princomp(hotspot, cor=TRUE, scores=TRUE)
#plot3d(pc$scores[,1:3], col=k.means.fit$cluster, main="k-means clusters")
hotspot$cluster <- factor(km.res$cluster)
ggmap(tormap) +
geom_point(data = torclus, aes(x = Long, y = Lat, color = hotspot$cluster)) +
theme_void() + coord_map()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
When we plot the clusters using principal component analysis in 2D, we can see that the clusters are relatively well separated. Cluster 1 and 4 are spread out, but clusters 2 and 3 are tighter. On the map, we can also see that cluster 4 is concentrated in downtown Toronto, while cluster 1 is in the northwest part.
#coerce data for Divisions
bygroup <- group_by(data, MCI, Division)
groups <- dplyr::summarise(bygroup, n=n())
groups <- groups[c("Division", "MCI", "n")]
div <- as.data.frame(spread(groups, key=MCI, value=n))
div_name = div[,1]
div <- div[, -1]
#normalize
for(col in names(div)) {
div[,col] = (div[,col] - mean(div[,col])) / sd(div[,col])
}
#determine number of clusters
#we can see there's an elbow around 3 clusters
wssplot(div, nc=15)
# k-means
km.res <- eclust(div, "kmeans", k = 3, graph = T)
fviz_silhouette(km.res, palette = "jco", ggtheme = theme_classic())
## cluster size ave.sil.width
## 1 1 8 0.46
## 2 2 9 0.21
## 3 3 17 0.96
km.res
## K-means clustering with 3 clusters of sizes 8, 9, 17
##
## Cluster means:
## Assault Auto_Theft Break_and_Enter Robbery Theft_Over
## 1 0.4761876 0.1302555 0.5495307 0.3599759 0.5014688
## 2 1.2770657 1.2633142 1.1990178 1.3373206 1.1973892
## 3 -0.9001819 -0.7301101 -0.8933768 -0.8773937 -0.8698972
##
## Clustering vector:
## [1] 1 3 1 3 1 3 2 3 2 3 2 3 2 3 2 3 1 3 2 3 2 3 2 3 2 3 1 3 1 3 1 3 1 3
##
## Within cluster sum of squares by cluster:
## [1] 6.9416751 17.1183040 0.1890451
## (between_SS / total_SS = 85.3 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault" "clust_plot" "silinfo" "nbclust"
## [13] "data"
#similar to neighborhoods, one cluster (3, with 17 divisions) has low crime incidents,
#while clusters 1 and 2 have higher incidents - cluster 2 (9 divisions) markedly so
#can we map this and see if they correspond to neighborhoods?
#do the Toronto police have a dedicated division for each high crime neighborhood?
Mapping this is probably not possible, as the following cross-tabulation of divisions against neighbourhoods suggests:
division_vs_hood_id = data[,c("Division","Hood_ID")]
table(division_vs_hood_id)
## Hood_ID
## Division 1 2 3 4 5 6 7 8 9 10 11 12 13
## D22 0 0 0 0 0 0 38 28 348 266 469 218 323
## D22 0 0 0 0 0 0 5 2 15 15 30 7 15
## D23 2774 1401 402 395 355 753 653 326 0 0 0 0 0
## D23 173 117 27 34 30 46 41 34 0 0 0 0 0
## (output truncated: a Division-by-Hood_ID contingency table with 140 neighbourhood
## columns, mostly zeros; several Hood_IDs, e.g. 7, 17, 27, 39 and 41, record
## incidents under more than one division)
## D11 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 315 0 0 0 0 0 0 0 0 0 0 0 0
## D33 20 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 6 0 0 0 0 0 0 0 0 1 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 496 330 0 0 0 0 0 0 0 0 0
## D53 0 0 23 14 0 0 0 0 0 0 0 0 0
## D54 0 661 0 0 209 373 297 294 575 297 0 0 0
## D54 0 27 0 0 8 23 19 10 24 18 0 0 0
## D55 0 0 0 0 0 0 0 0 1 847 559 421 580
## D55 0 0 0 0 0 0 0 0 0 31 23 23 17
## Hood_ID
## Division 66 67 68 69 70 71 72 73 74 75 76 77 78
## D11 0 0 0 0 0 0 0 0 0 0 0 1 0
## D11 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 1 0 2 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 1153 1050
## D14 0 0 0 0 0 0 0 0 0 0 0 45 36
## D22 0 0 0 1 0 0 0 0 0 0 0 1 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 124 752 635 2213 981 2713 0 539 0
## D51 0 0 0 0 12 23 27 92 49 110 0 19 0
## D52 0 0 0 0 0 0 0 0 0 1099 2246 1840 929
## D52 0 0 0 0 0 0 0 0 0 57 106 86 40
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 576 382 0 0 0 0 0 0 0 0 0 0 0
## D54 22 12 0 0 0 0 0 0 0 0 0 0 0
## D55 196 101 358 295 1122 0 0 0 0 0 0 0 0
## D55 5 4 7 9 42 0 0 0 0 0 0 0 0
## Hood_ID
## Division 79 80 81 82 83 84 85 86 87 88 89 90 91
## D11 0 0 0 0 249 122 41 710 445 697 368 521 284
## D11 0 0 0 0 9 6 3 35 14 18 13 26 13
## D12 0 0 0 0 0 0 0 0 0 0 0 132 177
## D12 0 0 0 0 0 0 0 0 0 0 0 2 3
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 546 552 819 908 128 336 803 211 0 0 0 0 0
## D14 18 12 29 26 2 18 32 4 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 1 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 2 0 0 0 0 0 1 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 292 0 0 0 0 0 0 0 0 0 0 0 0
## D52 17 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## Hood_ID
## Division 92 93 94 95 96 97 98 99 100 101 102 103 104
## D11 0 803 0 0 0 0 0 0 0 0 0 0 0
## D11 0 32 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 498 94 414 0 238 0 0 0 0 172 233 0 0
## D13 31 2 13 0 11 0 0 0 0 7 12 0 0
## D14 0 633 0 858 0 0 0 0 0 0 0 0 0
## D14 0 28 0 30 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 207 0 0 0 0 0 0
## D51 0 0 0 0 0 0 9 0 0 0 0 0 0
## D52 0 0 0 137 0 0 3 0 0 0 0 0 0
## D52 0 0 0 5 0 0 0 0 0 0 0 0 0
## D53 0 0 0 899 71 169 814 263 222 80 133 338 754
## D53 0 0 0 38 2 2 33 11 5 3 2 9 30
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## Hood_ID
## Division 105 106 107 108 109 110 111 112 113 114 115 116 117
## D11 0 0 0 0 0 0 36 0 0 122 0 0 0
## D11 0 0 0 0 0 0 1 0 0 3 0 0 0
## D12 0 0 0 1 0 344 781 406 959 0 578 0 0
## D12 0 0 0 0 0 7 32 12 43 0 35 0 0
## D13 0 263 528 721 210 1 0 0 0 0 0 0 0
## D13 0 15 26 30 12 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 1 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 314 0 0 0 0 0 0 0 0 0 0 0 0
## D32 8 0 0 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0 0 29 53
## D33 0 0 0 0 0 0 0 0 0 0 0 3 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 0 0 0 0 0 0
## D42 0 0 0 0 0 0 0 0 0 0 0 458 981
## D42 0 0 0 0 0 0 0 0 0 0 0 15 27
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D43 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 1 0 0 1 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 20 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0 0 0 0
## Hood_ID
## Division 118 119 120 121 122 123 124 125 126 127 128 129 130
## D11 0 0 0 0 0 0 0 0 0 0 0 0 0
## D11 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0 0 0 0
## D33 2 164 0 0 0 0 0 0 0 0 0 0 0
## D33 1 6 0 0 0 0 0 0 0 0 0 0 0
## D41 0 1387 1311 830 921 238 1040 555 1159 640 0 0 0
## D41 0 33 83 35 45 7 37 28 32 23 0 0 0
## D42 658 0 0 0 0 0 0 0 0 0 886 659 962
## D42 33 0 0 0 0 0 0 0 0 0 33 17 21
## D43 0 1 0 0 0 402 0 0 0 805 0 0 0
## D43 0 1 0 0 0 33 0 0 0 59 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0 0 0 0
## D54 0 1 107 18 0 0 1 0 0 0 0 0 0
## D54 0 0 10 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 82 16 0 0 0 0 0 0 0 0
## D55 0 0 0 3 1 0 0 0 0 0 0 0 0
## Hood_ID
## Division 131 132 133 134 135 136 137 138 139 140
## D11 0 0 0 0 0 0 0 0 0 0
## D11 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0
## D12 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0
## D13 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0
## D14 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0
## D22 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0
## D23 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0
## D31 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0
## D32 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0
## D33 0 0 0 0 0 0 0 0 0 0
## D41 0 0 0 0 0 0 0 543 0 0
## D41 0 0 0 0 0 0 0 16 0 0
## D42 1033 1283 1 0 0 0 0 0 0 0
## D42 41 58 0 0 0 0 0 0 0 0
## D43 177 0 200 549 561 1741 2011 627 795 266
## D43 9 0 11 36 44 124 125 41 41 12
## D51 0 0 0 0 0 0 0 0 0 0
## D51 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0
## D52 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0
## D53 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0
## D54 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0
## D55 0 0 0 0 0 0 0 0 0 0
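Because the full table is dominated by zeros, a compact listing of only the non-empty `Division`/`Hood_ID` cells conveys the same mapping in far fewer lines. The sketch below assumes the `data` frame loaded at the top of this report, with `Division` and `Hood_ID` columns as printed above:

```r
# Build the same Division x Hood_ID contingency table, then keep only the
# cells that actually contain incidents. This collapses hundreds of
# zero-filled rows into a short division-to-neighbourhood mapping.
tab <- as.data.frame(table(Division = data$Division,
                           Hood_ID  = data$Hood_ID))
nonzero <- subset(tab, Freq > 0)

# Order by division, then by count, so each division's dominant
# neighbourhoods appear first.
nonzero[order(nonzero$Division, -nonzero$Freq), ]
```

The same result can be reached with `dplyr::count(data, Division, Hood_ID)`; the base-R version is shown only to avoid depending on a particular dplyr version.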